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ABSTRACT 


This  thesis  investigates  optimizing  the  speed  of  computation  for  computing 
the  Choi-Williams  distribution.  The  Choi-Williams  distribution  is  a  way  of 
simultaneously  representing  a  signal  in  both  the  time  and  frequency  domains  in  a 
fashion  that  makes  it  possible  to  extract  the  waveform  parameters  of  the  signal. 
The  Choi-Williams  distribution  is  particularly  useful  for  analyzing  low  probability  of 
intercept  signals  for  electronic  intelligence  applications.  The  usefulness  of  the 
distribution  is  directly  correlated  to  the  speed  of  computation.  This  thesis 
examines  methods  in  which  the  Choi-Williams  distribution  can  be  modified  to 
increase  the  speed  of  computation  while  still  maintaining  its  ability  to  provide  a 
clear  picture  of  the  signal  characteristics.  By  eliminating  the  computation  of  near 
zero  terms  of  the  Choi-Williams  kernel  function,  the  speed  of  computation  can  be 
increased  dramatically  while  still  preserving,  and  improving,  the  time-frequency 
characteristics.  The  optimizations  developed  in  this  thesis  reduced  the  time  to 
compute  a  512  sample  CWD  from  6.9  seconds,  to  0.0466  seconds  on  an  Intel 
chip,  Linux  based  PC — an  increase  in  speed  of  147X. 
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EXECUTIVE  SUMMARY 


Low  Probability  of  Intercept  (LPI)  emitters  are  the  next  evolution  in  radar 
technology  and  represent  a  growing  threat  to  operating  forces,  civilian  targets, 
and  national  security.  The  operational  implementation  of  time-frequency  signal 
processing  techniques  such  as  the  Choi-Williams  distribution  (CWD)  are  useful 
for  the  detection  and  classification  of  these  threat  waveforms.  The  techniques, 
however,  are  currently  limited  due  to  non-real-time  computational  constraints. 
Developing  a  system  that  can  detect  and  classify  LPI  emitters  in  real  time  is  the 
first  step  in  developing  countermeasures  to  this  growing  threat. 

This  thesis  investigates  developing  a  highly  optimized  algorithm  for 
computing  the  Choi-Williams  distribution.  Using  a  C  code  implementation  as  a 
benchmark,  an  increase  of  over  100X  in  speed  of  computation  was  achieved.  It 
is  shown  that  the  optimizations  also  produce  a  clearer  and  more  accurate  picture 
of  the  input  signal  characteristics.  The  algorithm  developed  for  this  thesis  can  be 
ported  to  any  platform.  Many  of  the  optimizations  developed  can  be  applied  to 
the  computation  of  the  Wigner-Ville  distribution  as  well.  This  work  is  highly 
applicable  in  the  effort  to  develop  LPI  emitter  signal  detection  and  classification 
techniques. 

There  are  currently  several  signal  processing  algorithms,  which  are  able  to 
provide  time/frequency  representations  of  an  incoming  signal.  These  signal 
processing  techniques  are  useful  tools  in  the  detection  and  classification  of  LPI 
emitters;  however,  the  long  processing  time  it  takes  to  use  these  techniques 
prevents  these  techniques  from  being  used  in  a  real  time  system.  The  goal  then 
is  to  decrease  the  processing  time  to  compute  these  algorithms. 

The  CWD  is  notable  in  its  ability  to  provide  a  clear,  human-readable,  time- 
frequency  picture  of  an  LPI  signal  that  is  simple  to  classify.  The  ability  to 
compute  the  Choi-Williams  algorithm  in  real  or  near  real  time  will  be  a  significant 
contribution  in  developing  countermeasures  to  weapons  systems  utilizing  LPI 
emitters. 


Several  optimizations  are  presented  which  can  be  used  to  modify  the 
Boasash  algorithm,  reducing  the  complexity  and  number  of  computations  made. 
The  cumulative  effect  of  utilizing  these  optimizations  produces  a  hundred  fold 
increase  in  speed  of  computation  of  the  Choi-Williams  distribution.  Some  of  the 
optimizations  made  can  also  be  used  to  increase  the  speed  at  which  the  Wigner- 
Ville  distribution  (another  signal  processing  algorithm)  can  be  computed.  These 
optimizations  also  have  the  unintended  effect  of  providing  a  clearer  and  more 
accurate  picture  of  an  LPI  signal. 

To  compare  the  final  code  (pipe.c)  to  the  original  code  (choi.c  from 
[5])  five  test  runs  were  conducted  for  both  the  compiler  un-optimized  and 
compiler-optimized  versions  of  both  programs.  The  trial  runs  were  conducted  on 
an  Intel  chip,  Linux  based  PC.  Table  1  shows  the  results. 


From 

[5] 

From  this  work 

choi .  c 

choi .  c 

(compiler 

optimized) 

pipe .  c 

pipe .  c 

(compiler 

optimized) 

Trail  1 

46.81 

6.860 

0.05442 

0.04699 

Trial  2 

46.51 

6.887 

0.05445 

0.04655 

Trial  3 

46.50 

6.852 

0.05457 

0.04640 

Trial  4 

46.23 

6.872 

0.05470 

0.04631 

Trial  5 

46.49 

6.824 

0.05426 

0.04661 

Average 

46.51 

6.859 

.05448 

.046572 

Table  1 .  Time  in  seconds  of  trial  runs. 


The  optimized  version  of  the  code  produced  an  854X  increase  in  speed 
over  the  original  code.  The  optimized  version  of  the  code  that  was  compiled 
using  an  optimizing  compiler  produced  a  147X  increase  in  speed  over  the 
compiler-optimized  version  of  the  original  code. 
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I.  INTRODUCTION 


A.  CHOI-WILLIAMS  TIME-FREQUENCY  DISTRIBUTION  FOR 

ELECTRONIC  WARFARE 

Low  Probability  of  Intercept  (LPI)  emitters  are  the  next  evolution  in  radar 
technology  and  represent  a  growing  threat  to  operating  forces,  civilian  targets, 
and  national  security.  The  development  of  techniques  to  automatically  detect 
and  classify  LPI  emitters  is  of  extremely  high  importance.  Currently,  US  and 
allied  electronic  warfare  (EW)  systems  cannot  detect  or  classify  most  of  the  new 
types  of  LPI  radar  and  communication  systems  that  are  currently  being 
developed  and  deployed  by  adversary  nations.  LPI  systems  utilize  techniques 
such  as  frequency  hopping  and  direct  sequence  spread  spectrum  to  avoid 
detection.  As  more  and  more  of  these  systems  are  deployed  by  potential 
adversaries,  U.S.  and  allied  forces  will  be  at  greater  and  greater  risk  from  high¬ 
speed,  sea-skimming,  anti-ship  cruise  missiles  and  other  weapons. 

There  are  currently  several  signal  processing  algorithms  that  are  able  to 
provide  time-frequency  representations  of  an  incoming  signal.  These  signal 
processing  techniques  are  useful  tools  in  the  detection  and  classification  of  LPI 
emitters;  however,  the  long  processing  time  it  takes  to  use  these  techniques 
prevents  these  techniques  from  being  used  in  a  real  time  system.  The  goal  then 
is  to  decrease  the  processing  time  to  compute  these  algorithms. 

The  Choi-Williams  distribution  (CWD)  [1]  is  one  of  Cohen’s  generalized 
class  of  time-frequency  distributions  [2],  which  also  includes  the  Wigner-Ville 
distribution  and  the  spectrogram.  Cohen’s  time-frequency  distributions  have  been 
the  focus  of  extensive  research  and  study.  The  CWD  uses  an  exponential  kernel 
to  reduce  the  magnitude  of  cross  terms.  This  is  an  improvement  over  the 
Wigner-Ville  distribution,  which  simply  has  a  kernel  function  of  one  and  produces 
large  cross  terms  that  can  obscure  the  real  signal.  However,  much  of  the 
research  done  to  improve  the  speed  of  computation  of  the  Wigner-Ville 
distribution  is  applicable  to  computing  the  CWD,  as  well  as  other  time-frequency 

1 


distributions  based  on  Cohen’s  generalized  class.  In  particular  the  efficient 
algorithm  developed  by  Boasash  and  Black  [3],  has  been  key  in  an  effort  to 
improve  the  speed  of  computation  of  the  CWD  using  reconfigurable  computers 
[4],  [5].  Other  research  has  been  done  to  compute  the  CWD  using  parallel 
processing  [6],  [7],  Cardoso  et  al.  [6],  computed  the  CWD  using  a  parallel 
implementation  on  a  transputer  platform  with  five  processors.  They  were  able  to 
compute  the  CWD  for  1024  samples  in  20.96  ms.  In  [7],  Barry  used  matrix 
manipulation  techniques  to  provide  an  intuitive  approach  that,  when  combined 
with  parallel  processing,  will  improve  the  processing  speed  to  allow  real-time 
calculations  of  the  CWD. 

Other  efforts  to  improve  the  computation  speed  include  the  use  of 
instantaneous  frequency  [8],  and  the  fast  Hartley  transform  [9]  in  lieu  of  a  Fast 
Fourier  Transform  (FFT).  In  [8],  Jones  and  Boasash  investigated  the  spread  of  a 
signal  about  its  instantaneous  frequency  for  several  common  time-frequency 
distributions.  Their  efforts  led  to  an  adaptive  algorithm  that  may  be  used  to 
improve  the  time-frequency  resolution  of  multicomponent  signals.  In  [9], 
Narayanan  and  Prabhu  found  that  computing  Wigner-Ville  distribution  using  the 
Fast  Hartley  method  was  much  faster  than  using  an  FFT  method.  This  due  to 
considerably  less  multiplications  and  additions  needed  to  compute  the  Fast 
Hartley  method. 

Additional  research  in  this  field  has  been  conducted  in  comparing  the 
resolution  of  various  time-frequency  distributions  [10],  and  the  development  of 
new  tools  for  the  interpretation  and  quantitative  comparisons  of  high-resolution 
time-frequency  distributions  [11].  In  [10],  it  was  found  that  the  Choi-Williams 
distribution  is  the  most  attractive  for  signals  in  which  all  components  have 
constant  frequency  content.  In  [11],  Cunningham  and  Williams  introduce  some 
new  tools  for  the  interpretation  and  quantitative  comparison  of  high-resolution 
time-frequency  distributions. 

The  CWD  is  notable  in  its  ability  to  provide  a  clear,  visual,  time  frequency 

picture  of  an  LPI  signal  that  is  simple  to  classify.  The  ability  to  compute  the  Choi- 
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Williams  algorithm  in  real-  or  near  real-time  will  be  a  significant  contribution  in 
developing  intercept  receivers  for  LPI  signal  detection. 

B.  RESEARCH  OBJECTIVES 

The  intent  of  this  thesis  work  was  to  improve  the  speed  with  which  the 
CWD  can  be  computed  on  the  SRC-6  reconfigurable  supercomputer.  Some 
initial  work  on  this  has  been  done  [4],  with  limited  results.  The  approach  taken 
was  to  first  study  what  computations  were  being  made  by  the  computer,  then 
examine  how  to  best  format  the  computations  to  take  advantage  of  the  SRC-6’s 
reprogrammable  hardware. 

C.  APPROACH  AND  PRINCIPLE  CONTRIBUTIONS 

In  studying  the  algorithm  used  (an  implementation  of  the  Boasash 
algorithm),  it  was  discovered  that  many  multiplications  by  zero  were  taking  place. 
The  algorithm  was  rewritten  to  eliminate  these  multiplications.  Also,  it  was 
discovered  that  in  the  original  algorithm,  many  of  the  same  calculations  were 
being  done  multiple  times,  such  as  computing  of  the  weighting  function  (which 
never  changes)  N  times  where  N  is  the  number  of  signal  samples. 

Next,  it  was  discovered  that  the  conjugate  symmetry  of  the  distribution 
required  only  half  of  the  distribution  to  be  computed,  then  copied  into  the  other 
half  with  a  change  of  sign  for  the  imaginary  portion  of  the  complex  number. 
Eventually  a  method  was  found  to  fill  the  other  half  of  the  distribution  with  zeros. 
It  was  at  this  time  that  other  ways  of  representing  the  kernel  (weighting)  function 
were  investigated  to  see  if  using  powers  of  two  instead  of  the  exponential 
function  would  allow  easier  computation.  This  manipulation  was  determined  to 
not  be  useful  for  the  C  programming  language,  but  was  included  in  this  thesis 
because  it  could  be  useful  when  developing  a  hardware  solution  to  these 
algorithms. 

When  looking  at  three-dimensional  plots  of  the  kernel  (weighting)  function, 
it  was  noted  that  a  very  high  proportion  of  the  weights  were  near  zero,  and  that 
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these  weights  contributed  very  little  to  the  final  result  of  the  computation.  A  “cut 
and  slice”  method  was  developed  to  test  various  ways  of  stripping  out 
computations  that,  while  taking  as  much  time  as  the  other  computations,  would 
make  insignificant  contributions  to  the  final  result  of  the  distribution.  It  was  found 
that  only  a  fraction  of  the  computations  need  be  computed  to  still  produce 
excellent  results.  These  new  modifications  were  coded  into  the  original  C  code 
and  very  significant  speed  gains  were  made. 

It  was  also  found  that  stripping  out  the  low  (near  zero)  weighted 
contributions  to  the  sum  would  make  it  possible  to  reduce  the  number  of 
computations  needed  to  compute  the  FFT.  An  algorithm  was  developed  for  the 
generalized  Butterfly  Machine  (BFM),  which  reduced  the  layers  of  BFMs  needed 
for  the  FFT.  This  algorithm  is  directly  applicable  to  the  computation  of  zero- 
padded  FFTs.  The  FFT  from  the  benchmark  code  [5],  written  by  Prof. 
Breitenbach,  was  modified  with  his  permission  to  incorporate  these  new  changes 
and  the  total  computation  time  of  the  CWD  was  again  significantly  reduced. 

Next,  an  analysis  was  done  to  objectively  determine  the  affects  of 
eliminating  the  computation  of  near  zero  terms  in  the  weighted  summations.  As 
would  be  expected,  there  is  a  strong  correlation  between  the  number  of 
computations  eliminated  and  the  departure  in  results  from  the  unaltered  CWD. 
However,  in  many  cases,  the  time-frequency  picture  appears  smoother  in  the 
CWD  implemented  using  the  “cut  and  slice”  method  than  using  the  unaltered 
CWD.  For  a  signal  that  is  changing  frequencies  with  time  (such  as  an  LPI 
signal),  contributions  to  the  frequency  at  time  tby  a  sample  taken  at  time  t  +  r 
are  increasingly  irrelevant  as  r  increases,  and  in  fact  degrade  the  time 
frequency  picture.  With  further  analysis  it  was  found  that  the  “cut  and  slice” 
method  greatly  reduces  the  contribution  of  these  irrelevant  samples — reducing 
clutter  that  obscures  the  true  signal 

Finally,  all  of  the  modifications  were  coded  into  the  C  programming 
language  in  the  most  efficient  manner  as  possible  and  the  program  was  written 
such  that  it  could  be  easily  ported  to  a  pipelined  system  where  each  new  sample 
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produces  a  new  CWD.  The  new  code  was  run  using  the  Upperman  code  [5]  as  a 
benchmark.  The  new  code  ran  854  times  as  fast  as  the  old  code  when  run  using 
no  compiler  optimizations  and  147  times  faster  when  both  sets  of  code  were 
optimized  by  the  compiler.  The  optimized  version  of  the  code  was  able  to 
compute  the  CWD  for  a  512  sample  signal  in  46.6  ms  on  an  Intel  chip,  Linux 
based  PC. 

The  approximate  factor  of  6  increase  in  speed  improvement  for  the 
compiler  un-optimized  version  can  be  attributed  to  standard  computer 
engineering  coding  principles  such  as  loop  unrolling,  minimized  decision  making, 
and  the  reduction  of  redundant  computations.  The  new  code  is  highly  optimized 
to  be  ported  to  a  hardware  implementation  of  the  CWD,  as  well  as  a 
reconfigurable  computer  such  as  the  SRC-6. 

D.  THESIS  OUTLINE 

Chapter  II  of  this  thesis  provides  a  step  by  step  implementation  of  the 
Boasash  algorithm,  which  is  used  to  compute  the  CWD.  It  provides  a  breakdown 
of  the  equations  of  the  CWD,  and  provides  a  step  by  step  breakdown  of  how  the 
distribution  can  be  computed  using  the  Boasash  algorithm.  It  also  provides 
graphical  representations  of  the  affects  of  each  stage  of  the  computation,  providing  a 
clear  picture  of  what  is  happening  to  the  signal  at  each  stage  of  computation. 

Chapter  III  steps  through  several  modifications  that  can  be  made  to  the 
algorithm.  Each  section  shows  the  math  behind  the  modification,  how  each 
modification  reduces  the  number  of  computations  in  the  algorithm,  and  the  result  (if 
any)  on  the  final  result  of  the  distribution. 

Chapter  IV  examines  the  cumulative  effect  of  all  the  optimizations  on  the 
distribution,  documenting  both  the  error  produced  by  the  optimizations  and  the 
increase  in  processing  speed  achieved.  It  is  shown  that,  in  addition  to  the 
hundred-fold  increase  in  processing  speed  accomplished,  the  optimized 
computing  algorithm  actually  produces  a  more  accurate  representation  of  the  LPI 
signal.  Chapter  V  provides  the  conclusions  that  can  be  drawn  from  the  work 
accomplished  in  this  thesis. 
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II.  COMPUTATION  METHOD 


A.  CHOI-WILLIAMS  DISTRIBUTION 

The  CWD  is  one  of  a  class  of  time-frequency  distributions  introduced  by 

Cohen.  In  discrete  form,  the  CWD  can  be  expressed  as 

00 

CWD^{1,q})  =  2Y,  W(n)S(£, (2.1) 


where 


00 

S(i,n)=YnM) 

/J=~oo 


1 


(p-ef 

e  x{ju  +  n)x^{ju-n) 


(2.2) 


and  W{/u)\s  a  uniform  window  that  has  a  value  of  one  for  the  range  -MU 
through  M/2,  and  W{n)\s  a  uniform  window  that  has  a  value  of  one  for  the 
range  -N !  2  through  N 12  and  is  zero  elsewhere  [1]. 


The  above  equations  can  be  modified  in  order  to  tailor  them  to  allow  the 
use  of  a  standard  FFT  (Fast  Fourier  Transform)  in  calculating  the  distribution. 
For  the  work  described  in  this  thesis,  all  examples  will  use  data  sets  of  N 
samples  where  is  a  power  of  two.  Setting  the  summation  windows  over  n 
and  ju  to  go  from  -N /  2  to  N  /  2- and  having  i  realize  the  same  range,  will 
cover  all  non-zero  values  in  the  distribution  and  will  enable  the  use  of  the  FFT. 
This  will  be  clearer  in  the  following  analysis. 


Without  loss  of  generality,  the  N  samples  are  labeled  through 

0^  - 1^-  Equation  (2.2)  can  now  be  expressed  as 


S(£,n)=  ^  (p(ju,£,n)A[ju,n) 


(2.3) 


where 


A  Qx,  n^=  x(ju  +  n)x  *{ju-n) 


(2.4) 


7 


/  cr 

Note  that  (2.5)  is  undefined  for  n  =  0  .  Using  the  substitutions 


K  =  jU- 


4n/a 


d.K  =  ,  d/u 
Mn^lcj 


note  that 


,=-»  /<j 


Ann  (j 


r=-oo  \Ann^  /(J 


1 

1=2  J  dk 


Using  the  identity  [  dx  =  —  [12]: 

J  0 


-^2  f  e-'^"  dk  =  ^—  =  \ 
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This  shows  that  (2.5)  is  a  Gaussian  distribution  over  ju,  which  has  a  variance  of 
zero  for  ju  =  i  when  n  =  0  .  Using  the  sifting  property: 


(f>iju,i,n  =  0)  = 


1  //  = 
0  jUJ^ 


If  we  define  the  function  5"(^,«)to  be  (this  is  a  generalization  of  the 
equation  used  in  the  Boasash  algorithm), 

Si£,n)  0<n<^-l 
S'{£,n)  =  \  0  ^  = 


S{£,n-N)  ^/^  +  \<n<N-\ 


In  this  way,  S{l,n)  for  n  =  -^  through  ^-1  is  mapped  to  S'[£,n)  for  n  =  0 
through  A^-1.  So  (2.1)  becomes: 
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A^-1 

CfFD^ (£,(u)  =  2  J^S'(£, 

n=0 

Finally,  this  representation  can  be  modified  to  fit  the  standard  Fast  Fourier 
Transform  (FFT)  by  making  the  substitution:  co  =  thus  2.1  becomes 

7  A^— 1  I  ^ 

CWD^{i,'^)  =  2Y,S'(i,n)e'  ^  k  =  0,\,...,N (2.6) 

^  H=0 

Note  that  transforming  2.1  into  an  FFT  equation  results  in  frequencies  that 
are  double  what  they  initially  were.  This  effect  can  be  nulled  by  simply  halving 
the  values  of  the  frequency  axis  in  the  plotted  representations  of  the  signal. 

This  section  has  shown  how  the  discrete  CWD  is  modified  in  order  to 
compute  the  distribution  for  an  N  sample  window  using  an  FFT.  The  next 
section  breaks  down  the  Boasash  algorithm  into  steps  and  shows  the  results  of 
performing  each  step  on  an  example  input  signal. 

B.  COMPUTING  THE  DISTRIBUTION 

This  section  details  how  to  compute  the  CWD  using  the  equations  in  the 
previous  section  and  shows  an  example  of  the  resultant  output  for  a  Costas 
frequency  hopping  signal  with  a  signal  to  noise  ratio  (SNR)  of  0  dB  sampled 
W  =  5 12  times  (Figure  1 ).  The  Appendix  shows  the  details  for  the  signals  used  in 
this  thesis. 
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Figure  1 .  512  samples  of  Costas  signal  (real  portion). 

The  algorithm  used  to  compute  the  CWD  for  a  signal  sampled  N  times 
can  be  expressed  in  the  following  manner.  First  the  N  samples  are  stored  in  an 
array.  This  gives  us x( A:)  where 

\  sample  at  time  k  -  <k<  - 1 

x{k)  =  \  ^  /2  /2  (2.7) 

[  0  elsewhere 

Next,  the  function  can  be  written  to  an  N  by  N  array.  For 

illustration  purposes,  Figure  2  shows  an  example  of  how  A(^,n^  for  an  A^=8 
sample  signal  would  be  arrayed. 
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ld\n 

0 

1 

2 

3 

-4 

-3 

-2 

-1 

3 

x(3)-x(3)* 

x(4)-x(2)* 

x(5)-x(l)* 

x(6)-x(0)* 

x(-l)-x(7)* 

x(0)-x(6)* 

x(l)-x(5)* 

x(2)-x(4)* 

2 

x(2)-x(2)* 

x(3)-x(l)* 

x(4)-x(0)* 

x(5)-x(-l)* 

x(-2)-x(6)* 

x(-l)-x(5)* 

x(0)-x(4)* 

x(l)-x(3)* 

1 

x(l)-x(l)* 

x(2)-x(0)* 

x(3)-x(-l)* 

x(4)-x(-2)* 

x(-3)-x(5)* 

x(-2)-x(4)* 

X(-l)-x(3)* 

x(0)-x(2)* 

0 

x(0)-x(0)* 

x(l)-x(-l)* 

x(2)-x(-2)* 

x(3)-x(-3)* 

x(-4)-x(4)* 

x(-3)-x(3)* 

x(-2)-x(2)* 

x(-l)-x(l)* 

-1 

x(-l)-x(-l)* 

x(0)-x(-2)* 

x(l)-x(-3)* 

x(2)-x(-4)* 

x(-5)-x(3)* 

x(-4)-x(2)* 

x(-3)-x(l)* 

x(-2)-x(0)* 

-2 

x(-2)-x(-2)* 

x(-l)-x(-3)* 

x(0)-x(-4)* 

x(l)-x(-5)* 

x(-6)-x(2)* 

x(-5)-x(l)* 

x(-4)-x(0)* 

x(-3)-x(-l)* 

-3 

x(-3)-x(-3)* 

x(-2)-x(-4)* 

x(-l)-x(-5)* 

x(0)-x(-6)* 

x(-7)-x(l)* 

x(-6)-x(0)* 

x(-5)-x(-l)* 

x(-4)-x(-2)* 

-4 

x(-4)-x(-4)* 

x(-3)-x(-5)* 

x(-2)-x(-6)* 

x(-l)-x(-7)* 

x(-8)-x(0)* 

x(-7)-x(-l)* 

x(-6)-x(-2)* 

x(-5)-x(-3)* 

Figure  2.  for  iV  =  8 


Note  that  in  Figure  2,  several  of  the  boxes  are  grey.  These  boxes  are 
highlighted  because  one  or  both  terms  in  the  box  falls  outside  of  the  sample 
window  (x(-4)  through  x(3))  and  the  result  of  the  complex  multiply  will  be  zero. 

Figure  3  shows  the  absolute  values  of  what  becomes  of  our  Costas  signal 
once  it  has  been  processed  in  the  same  manner  and  mapped  into  a  512  by  512 
array.  Note,  that^(^,n)is  not  dependent  upon  the  parameter  1.  This  array  will 
be  used  for  all  subsequent  calculations  and  need  only  be  computed  once. 
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The  remaining  steps  in  the  process  of  computing  the  CWD  will  be 
repeated  for  all  i..  For  this  example,  -256<i<255 .  The  following  figures  are 
used  to  illustrate  the  algorithm  and  are  all  computed  with  ^  =  0 ,  a  time  in  the 
middle  of  the  sampled  signal. 

First,  a  matrix  is  created  to  represent  the  kernel  function  (j){^^,l,n)  shown 
in  (2.5).  Figure  4  shows  the  result  for  all  //  and  n,  with  cr  =  l,  and  f  =  0.  The 
value  of  cr  (sigma)  is  a  constant,  and  will  be  discussed  in  more  detail  later. 
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03&^-  "  I 

0.3<- I 
025^  ’  : 

5  0.2v.  '  i 


Figure  4.  Choi-Williams  kernel  function  l  =  Q,  N  =  5\2. 


Next,  the  array  is  weighted  (multiplied)  by  the  kernel  function,  producing 
an  array  that  represents  (//,«)  for  all  n  and  ju  for  the  current  iteration, 

of  ^ .  Figure  (5)  shows  the  kernel  function  for  ^  =  0 . 
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0.5 
0  4 

3 

tO.3 

j;o.2 

■^0.1 

0 


-256  0 

M-  n 

Figures.  ^[jU,i,n)A[ju,n)  for  £  =  0 . 

Next,  each  column  is  summed  producing  a  1  x  N  array  representing 


^-1 

S\i,n)=  Y,  (l>[ju,£,n)A[ju,n) 

for  i  =  0.  Figure  6  shows  the  S  \l,  n)  for  ^  =  0  (the  results  of  the  summation 
over  ju). 
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Finally,  an  FFT  is  used  to  compute  the  frequency  for  the  current  I  and 
saved  to  an  array  that  represents 

7lk  .  -jink— 

CWD^(£,—)  =  2j^S(£,n)e  ^ 

”  n=0 

for  the  current  £ .  Figure  7  shows  the  result  of  the  FFT  for  £  =  0.  Figure  8 
depicts  the  entire  time-frequency  distribution  for  all  £ .  The  red  line  shows  the 
proper  place  in  the  time-frequency  plot  for  thef  =  0  calculation.  Notice  that  the 
time-frequency  plot  shows  the  frequency  crossing  the  £  =  0  line  to  be  4000, 
which  reflects  the  frequency  shown  in  Figure  7.  The  time-frequency  plot  is  a  top 
down  view  of  the  results  for  all  £ ,  with  colors  mapped  so  that  high  returns  show 
in  red  and  low  returns  in  blue. 
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Figure  7.  The  (discrete  CWD  for  £  =  0 . 
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Figure  8.  Yellow  line  shows  location  of  CWD  for  i  =  Q  in  the  time-frequency 

(distribution. 
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When  this  algorithm  is  computed  for  all  I,  the  resulting  array  can  be 
plotted  representing  frequency  as  a  function  of  time.  The  resulting  plot,  Figure  9, 
provides  a  picture  of  the  signal  that  is  simple  to  analyze.  The  time-frequency  plot 
is  a  top  down  view  of  the  results  for  all  i ,  with  colors  mapped  so  that  high  returns 
show  in  red  and  low  returns  in  blue. 


Time  (s) 


Figure  9.  Plot  of  Choi-Williams  processed  Costas  signal,  SNR  =  OdB . 

The  algorithm  expressed  in  this  section  was  implemented  in  C  code  [4], 
[5].  The  average  time  to  complete  this  algorithm  for  an  N  =  5\2  sample  was 
approximately  46  seconds.  This  time  was  reduced  to  approximately  6  seconds 
using  compiler  optimizations.  Figure  10  shows  a  summary  of  this  algorithm.  The 
next  chapter  steps  through  the  optimizations  that  were  used  to  accomplish  this. 
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^  Magnrlu* 


The  signal  is  sampled  times. 


2.  A(^,n^  is  computed. 


3.  A{jj.,n^  is  weighted  by  the  kernel 
function. 


4.  The  result  is  summed  over  //. 


5.  Then  fed  through  an  FFT.  6.  Steps  3  through  5  are  computed  for 

all  1. 

Figure  10.  A  summary  of  the  steps  to  compute  the  CWD. 
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III.  OPTIMIZATIONS  FOR  FASTER  COMPUTATION 


A.  ELIMINATE  COMPUTATIONS  THAT  WILL  GIVE  ZERO  AS  A  RESULT 

Equation  (2.7)  states  that  all  values  for  x{k)  outside  the  range 

are  equal  to  zero.  The  resulting  array  in  Figure  11  is 

functionally  equivalent  to  the  array  in  Figure  2.  This  reduces  the  number  of 
computations  for  the  array  by  a  factor  of  2. 


ju\n 

0 

1 

2 

3 

-4 

-3 

-2 

-1 

3 

x(3)-x(3)* 

0 

0 

0 

0 

0 

0 

0 

2 

x(2)-x(2)* 

x(3)-x(l)* 

0 

0 

0 

0 

0 

x(l)-x(3)* 

1 

x(l)-x(l)* 

x(2)-x(0)* 

x(3)-x(-l)* 

0 

0 

0 

X(-l)-x(3)* 

x(0)-x(2)* 

0 

x(0)-x(0)* 

x(l)-x(-l)* 

x(2)-x(-2)* 

x(3)-x(-3)* 

0 

x(-3)-x(3)* 

x(-2)-x(2)* 

x(-l)-x(l)* 

-1 

x(-l)-x(-l)* 

x(0)-x(-2)* 

x(l)-x(-3)* 

x(2)-x(-4)* 

0 

x(-4)-x(2)* 

x(-3)-x(l)* 

x(-2)-x(0)* 

-2 

x(-2)-x(-2)* 

x(-l)-x(-3)* 

x(0)-x(-4)* 

0 

0 

0 

x(-4)-x(0)* 

x(-3)-x(-l)* 

-3 

x(-3)-x(-3)* 

x(-2)-x(-4)* 

0 

0 

0 

0 

0 

x(-4)-x(-2)* 

-4 

x(-4)-x(-4)* 

0 

0 

0 

0 

0 

0 

0 

Figure  11.  for  A^  =  8 . 
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B.  CONJUGATE  SYMMETRY  ADVANTAGE 


Analysis  shows  that  is  equal  to  the  conjugate  of 

for  Xj  =  ^  +  jB  and  x^=C  +  jD\ 

Xj  •  x;  =  (a  +  jB^  -  jD)=  (aC  +  BD)+  j{BC  -  AD) 
=  [{aC  +  BD)-  j{BC  -  Ad)[  =  [{a  -  y^Xc  +  Jd)^ 


or 


So 


Xi-X2  = 


(^2-<) 


A{/ii,n)  =  \^A{/u-n)\ 


for  all  .  Also,  note  that  the  kernel  function  is  strictly  symmetric  about 


V4^72^Ta 


(H-tf 

An^  la 


SO 


Therefore 


(l)[jU,i,n)A[ju,n)  =  \j)[jU,i,-n)A[ju,-n)\ 


Finally 


Z(/(oJ  =  Z(/(o) 


and 


S{l-n)  =  [S{l,n)'\ 


That  is, 


(3.1) 


(3.2) 


(3.3) 


(3.4) 
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and 


S\i,n)  =  \S\i,N-n)X 

,  (3.5) 

for  +  \<n<N-\ 

Since  the  array  for  S\i,n)  is  symmetrical  in  this  way,  only  the  values  of 


S\l,n)  for  0<n<^-l  need  be  computed.  Then,  the  conjugates  of  the 


function  for  \  <n<  - 1  can  be  reverse  ordered  and  used  as  the  results  of  the 

function  for  ^^  +  \<n<N This  reduces  the  calculations  for  S\i,n)  by 

another  factor  of  two  (for  large  values  of  N).  The  only  cost  in  computing  time  is 
the  need  to  change  the  sign  of  the  imaginary  portion  of  the  real  number  and  copy 
the  appropriate  values  into  the  array.  Figure  12  shows  how  A{ju,n)  maps  to 

[^(//,-n)]  . 


Figure  12.  for  iV  =  8 . 
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The  next  optimization  takes  further  advantage  of  the  symmetric  nature  of 
the  CWD  and  will  increase  the  speed  of  computation  with  no  deleterious  effects. 
Note  that  (3.5)  establishes  the  conjugate  symmetricity  of  the  function  that  is  input 
into  the  FFT; 

S\l,n)  =  [_S\l,N-n)^ 

for  + 1  <  n  <  - 1 .  Taking  a  closer  look  at  (2.6): 

Trk  .  -jink— 

CWDJ£,—)  =  2XS(£,n)e  ^ 

^  n=0 


the  summation  representing  the  Fourier  transform  of  the  input  array  may  be 
manipulated  as  follows.  The  summation  can  be  broken  into  two  parts. 


=  Y,S'(i,n)e 

n=0  n=0 


N 


Z,  j '2, 71  k 

S{l,n)e  ^ 


„=N/ 


Taking  further  advantage  of  3.5,  the  second  term  ends  up  being  the  conjugate  of 
the  first  term,  minus  the  result  of  the  function  for  n  =  0 . 


N-\ 


-j2nk^  ^-1 

N 


Z,  j27rk — 

S{l,n)e  ^ 

n=^/C+\ 


-ilTTk— 

=  0+  £  S\i,N-nye  " 


n=%+\ 


Using  the  substitution  n'  =  N-n 


^  1  ,  -jlnk^ 

^  S{i,N-nye  ^  S(i,n'ye  ^ 


=  ^S\l,nye 


-j(-2nk—+2nk)  ^ 


n=l 


r  *  j'2.7rk — 

=  ^S{£,nye  ^ 

n=\ 


or 


^  .  -J2nk7L 

Y,Si£,n)e  ^  = 


%-!  .  -J2nk- 

^Si£,n)e  ^ 

n=\ 
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Since  the  summing  of  a  complex  number  with  its  conjugate  will  equal  double  the 
real  portion  of  the  number  (A  +  A*  =2Re{A} )  and  5"(^,0)  will  always  be  a  real 
number,  the  total  summation  then  becomes: 

=2Re<  J  S' >-S'i£,0)  (3.6) 

n=0  n=0 

This  result  can  be  used  in  the  following  way.  A  new  function  5'"(^,n)can  be 
defined  as: 

'y2S\£,n)  «  =  0 

S^{£,n)  =  <  S\£,n)  1<«<^-1  (3.7) 

0  elsewhere 

The  CWD  then  becomes: 

Tlk  u  ~ 

CWD^{(,—)  =  ‘qi,^“(t,n)e  «  (3.8) 

”  n=0 

Using  this  equation,  the  exact  same  real  results  are  produced,  but  only  the  left 
half  of  the  input  array  need  be  computed.  The  imaginary  part  of  the  original 
equation  always,  by  definition,  summed  to  zero.  In  the  new  equation,  the 

imaginary  part  is  simply  discarded.  For  ^^<n<  N ,  all  input  values  into  the 

FFT  are  zero.  This  manipulation  was  verified  using  MATLAB  and  the 
programming  language  C.  The  algorithm  was  modified,  as  described  above,  and 
the  exact  same  results  were  obtained. 

The  next  section  discusses  the  “cut  and  slice”  optimization,  which 
eliminates  the  computation  of  near  zero  terms  stemming  from  very  small  values 
in  the  kernel  function. 
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C.  CUT  AND  SLICE 

All  optimizations  made  up  to  this  point  have  had  no  affect  on  the  final 
result  of  the  computation.  This  next  optimization  changes  the  final  result  of  the 
computation.  An  analysis  of  the  changes  affected  by  this  optimization  is  provided 
in  the  next  chapter. 

Another  way  to  vastly  improve  the  speed  with  which  the  distribution  can  be 
computed  is  to  take  advantage  of  the  properties  of  the  kernel.  Figures  13 
through  15  show  that  much  of  the  kernel  function  appears  to  be  near  zero. 


3 


Figure  13.  Choi-Williams  kernel  function  i  =  0,  N  =  5\2. 


-1 
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Figure  15.  Choi-Williams  kernel  function  £  =  0,  N  =  512. 
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Indeed,  looking  at  the  kernel  function  again, 

.  (p-ef 

(piju,i,n)=  ,  g 

/  cr 

it  can  be  seen  that  the  further  away  from  (//-^)  =  0  and  n  =  0,  the  closer  the 
kernel  function  gets  to  zero.  Since  any  number  multiplied  by  near  zero  will  have 
little  contribution  to  a  summing  operation,  it  can  be  reasoned  that  eliminating  the 
calculations  that  will  produce  near  zero  results  will  reduce  the  computation  time 
while  having  little  affect  on  the  final  result.  An  analysis  was  conducted  to 
subjectively  measure  the  degradation  in  results  of  reducing  near  zero  terms  and 
determine  a  threshold  that  does  not  degrade  the  results  noticeably.  A  subjective 
analysis  was  used  instead  of  a  deterministic  one  because  the  ability  of  this  signal 
processing  technique  to  provide  a  clear  picture  of  the  signal  cannot  be 
determined  by  any  function. 

To  conduct  this  analysis,  the  weighting  kernel  function  was  first  “cut.”  The 
variable  “cut”  was  defined  as  the  amount  of  columns  surrounding  the  peak  of  the 
kernel  function  that  were  used  in  the  computation.  For  example,  a  “cut”  of  256 
would  include  the  entire  512  columns.  A  cut  of  128  would  include  the  first  128 
columns  and  the  last  128  columns.  The  variable  “slice”  was  used  to  eliminate 
rows  above  and  below  the  peak  of  the  kernel  function.  A  “slice”  of  64  would 
include  the  row  with  the  peak  of  the  weighting  function  and  the  63  rows  both 
above  and  below  the  peak  of  the  kernel  function.  First,  the  kernel  was  cut  to  find 
a  threshold  where  the  picture  began  to  degrade.  As  can  be  seen  in  Figure  16, 
the  picture  is  degraded  noticeably  for  cut  <  32.  Note  that,  cut  =  256,  slice  =  512, 
is  identical  to  the  original  computation. 
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cut  =  256,  slice  =  512; 


cut  =  128;  slice  =  512 


cut  =  64,  slice  =  512; 


cut  =  32;  slice  =  512 


cut  =  16,  slice  =  512; 


cut  =  8;  slice  =  512 


Figure  16.  Evaluation  of  the  Cut  method  on  the  Costas  frequency  hopping 

signal. 
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Figure  17  shows  a  similar  analysis  for  “slice”.  The  subjective  threshold  for 
degradation  was  determined  visually  to  be  slice  =  32. 


cut  =  256  slice  =  512 


cut  =  256  slice  =  256 


cut  =  256  slice  =  32 


cut  =  256  slice  =16 


Figure  17.  Evaluation  of  the  Slice  method  on  the  Costas  frequency  hopping 

signal. 
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Next,  cut  and  slice  were  conducted  at  the  two  thresholds  at  the  same  time 
to  verify  that  the  effects  of  the  two  optimizations  do  not  have  a  multiplicative 
affect.  Figure  18  shows  that  reducing  the  computations  to  thresholds  of  32  rows 
and  32  columns  surrounding  the  peak  of  the  kernel  function  produces  an  image 
that  is  sufficiently  similar  to  the  full  version  as  to  allow  the  picture  of  the  signal  to 
still  be  readable. 


Cut  =  32  Slice  =  32 


Cut  =  256  Slice  =  512  (Full  Version) 


Figure  18.  Comparison  of  reduced  computation  and  full  computation  results. 
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Figure  18  shows  the  kernel  function  that  has  been  cut  and  sliced.  Since 
all  rows  and  columns  more  than  32  away  from  the  peak  of  the  function  are  zero, 

and  therefore  produce  a  result  of  zero  when  multiplied  by  ^ these  products 

are  simply  not  computed.  In  Figure  19,  cut  is  the  shorter  axis,  slice  is  the  long 
axis.  Slice  =  32  means  32  to  either  side  of  0. 


Figure  19.  The  kernel  function  with  Cut  =  32,  Slice  =  32. 


The  next  section  shows  how  the  FFT  algorithm  can  be 
advantage  of  all  the  zeroed  values  within  the  kernel  function. 


modified  to  take 
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D.  REDUCED  FFT 

The  next  optimization  takes  advantage  of  all  the  zeros  produced  by  the 
previous  optimizations.  This  optimization  increases  the  speed  of  the  FFT 

computation  by  Since,  without  this  optimization,  the  FFT  accounts  for 

roughly  half  of  the  total  computation  time  for  the  CWD,  this  is  significant 
improvement. 

Figure  20  shows  a  common  way  to  implement  an  FFT  using  log^n  layers 

of  BFMs  to  implement  an  n  point  FFT.  Figure  20  shows  an  eight  point  FFT 
implemented  with  three  layers  of  BFMs. 


In  Figure  20,  arrows  joining  at  a  circle  indicate  that  two  values  are  summed.  The 
various  weighting  terms  shown  represent  the  twiddle  factors  that  are  multiplied 
where  =Qx^[-j27rklN) .  The  negative  ones  indicate  that  the  arrow  is 

multiplied  by  a  negative  one.  Figure  21  shows  how  the  algorithm  is  altered  when 
all  but  the  first  two  inputs  (x(0)  and  x(1))  are  made  equal  to  zero. 


31 


Figure  21.  Modified  FFT  [modified  From  13]. 


Notice  that  the  first  two  layers  of  BFMs  can  be  completely  skipped  and  the  inputs 
can  be  fed  directly  into  the  third  layer.  This  same  affect  works  on  FFTs  of  this 
type  for  all  sizes  with  the  following  conditions  and  effects: 


1 .  For  any  FFT  calculated  for  n  inputs  where  log2 « is  a  whole  number. 

2.  If  every  input  following  the  first  m  values  is  equal  to  0. 

3.  And  logj/n  is  a  whole  number. 

(  n  ^ 

4.  Then  the  m  inputs  can  be  fed  directly  into  the  logj  — +  1  \  th  BFM  layer  of 

'v  m  } 

the  n  point  FFT. 


fl 

5.  Each  input  will  be  fed  into  —adjacent  inputs. 

m 

6.  And  the  order  of  the  inputs  is  in  the  order  of  the  values  represented  by  the 
reversed  order  bit  representations  of  the  values  zero  through 
m-\ represented  in  log2 m  bits. 
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For  the  example  above: 


1. 

2. 

3. 

4. 


n  =  8  logj  n  =  3 

Every  input  past  the  first  two  inputs  is  zero. 

m  =  2  logj  m  =  \ 


The  first  two  values  are  fed  directly  into  the  log^ 
point  FFT. 


g 

-  + 1  =  3  rd  layer  of  the  8- 


g 

5.  Each  input  is  fed  into  -  =  4  adjacent  inputs. 

6.  The  values  zero  through  2-1  =  1  are  represented  inlog22  =  l  bits.  A  bit 

reversal  is  conducted  (trivial  in  this  case),  and  the  order  of  input  is 
established  by  these  reverse  bit  values  (again  in  this  case  no  reordering  is 
required). 


For  the  purposes  of  implementing  the  above  optimization  for  our  512  input  FFT 
with  a  “cut”  of  32,  limiting  the  input  to  only  32  values: 

1.  n  =  5\2  \og^n  =  9 

2.  Every  input  past  the  first  32  inputs  is  zero. 

3.  m  =  32  logj  m  =  5 

512 

4.  The  first  32  values  are  fed  directly  into  the  log^  - 1  =  5r/z  layer  of  a  512 
point  FFT. 

512 

5.  Each  input  is  fed  into  =  16  adjacent  inputs. 

6.  The  values  zero  though  31  are  represented  in  5  bits.  A  bit  reversal  is 
conducted  and  the  order  of  inputs  is  established  by  these  reverse  bit 
values  (see  Table  2). 
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0-31 

Reversed 

Order  of 

bit 

inputs 

0 

00000 

0 

1 

10000 

16 

2 

01000 

8 

3 

11000 

24 

4 

00100 

4 

31 

11111 

31 

Table  2.  Order  of  inputs. 


The  next  section  shows  an  optimization  that  might  be  beneficial  if  the 
CWD  were  computed  using  a  reconfigurable  computer. 

E.  AN  OPTIMIZATION  NOT  REALIZED 

The  next  optimization  was  not  realized,  due  to  the  fact  that  implementation 
using  C  code  would  probably  provide  a  negligible,  if  any,  increase  in  speed  of 
computation.  It  is,  however,  an  interesting  optimization  in  that  it  uses  powers  of 
two,  instead  of  the  exponential  function,  making  it  a  potentially  powerful 
optimization  to  use  with  reconfigurable  computers.  Also,  this  section  takes  a 
closer  look  at  the  mysterious  parameter  known  only  as  “sigma”. 

From  (2.5)  we  start  with  (l){ju,l,n) : 


(j)iyjj,,l,n)=  ,  ^=e 

/  cr 


Using  the  identity 


the  following  simplification  can  be  made 
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I  —  a  ^ 

\l - 

V  47rn  n 

then  using  the  identity 

log,  /  =  2log,J 

the  following  simplification  can  be  made 


1  cr 


log; 


V  y 


1  (M-t)  O',  /  ^ 

1  ^  log;(e) 


n\47r 


4n^ 


n\  47r 


then  using  the  identity 


log^j) 


_log^ 


logX-^) 

the  same  equation  can  be  represented  as  follows: 


1  I  i/J-f)  CTln(e) 

n\4n  n\47T: 


or  finally 


(l){ju,i,n)  = 


in~(r 

An^  la 


=  -/2 
n\4n 


(7 

41n(2) 
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Before  proceeding  further,  let  us  take  a  closer  look  at  a .  Sigma  changes 
the  distribution  of  the  kernel  function.  Up  until  this  point,  all  computations  in  this 
thesis  have  been  done  with  cr=l.  Figures  22  through  29  represent  the  kernel 
function  for  various  values  of  cr  and  its  affect  upon  the  processing  result  of  the 
CWD,  using  our  running  example  of  a  Costas  frequency  hopping  signal  with 
SNR  =  0dB. 


Figure  22.  cr  =  0.01. 


Figure  23.  cr  =  0.1. 
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Figure  24.  cr=l. 


Figure  25.  c  =  2.77258872224  =  41n(2) . 


Figure  26.  cr  =  10. 
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9' 

9  * 


^  I ' 


M  o 


Figure  27. 


no 

1  « 


i  * 


Figure  28. 


S 

20 


Figure  29. 


TtiwdI 


cr  =  100. 


Ttn*  (•) 


o-  =  1000. 


T»n#(*) 


o-  =  10,000. 
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Notice  that  in  Figure  24  for  cr  =  2.77258872224  =  4 ln(2)  (a  constant),  the 
signal  representation  is  good  and  the  kernel  function  becomes: 

=1  ^2-^^  =  0.220635600153  ^3  g, 

n\  4/1  n 

Note  that  (3.9)  provides  the  mathematically  same  results  as  (2.5)  and  Figure  25 
with  cr  =  2.77258872224,  using  the  original  kernel  function: 


/  cr 

Figure  30  shows  that  if  we  round  all  values  of  the  exponent  term, 

,  to  the  nearest  integer,  we  get  the  following  kernel  function  and 

results  for  our  running  example  of  the  Costas  signal.  Figure  31  shows  a  close-up 
of  the  modified  kernel  function  with  its  exponent  rounded.  Notice  that  the  plot  of 
the  rounded  kernel  function  is  not  as  smooth  as  the  unrounded  kernel  function, 
but  that  it  still  retains  the  same  overall  shape  and  characteristics. 


Figure  30.  Rounded  kernel  function,  cr  =  41n(2). 
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Figure  31.  Comparison  of  the  rounded  kernel  function,  cr=  4  ln(2),  with  the 

unrounded  kernel  function. 


Although  the  effect  of  rounding  the  kernel  function  to  the  nearest  power  or 
two  is  noticeable,  the  effects  upon  the  output  distribution  are  negligible.  Figure 
32  compares  the  two  outputs  generated  for  Figures  25  and  30.  It  is  difficult  to  tell 
the  difference  between  the  two. 


Figure  32.  Comparing  rounded  to  unrounded  version,  cr  =  41n(2). 
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If  the  new  rounded  version  is  used,  for  every  S\l,n)  computed, 
S'[l  +  \,n)  =  S'[l,n)2'‘  where  a  is  some  integer  (equally  applicable  to 
[l  +  \,n)  =  S" ).  If  IEEE  floating  point  is  used  to  represent  the  number, 
then^(//,n)  need  only  be  weighted  once  by  multiplying  it  by  a  kernel  function. 

Since  the  rest  of  the  weighting  kernel  functions  in  a  column  are  related  by  some 
power  of  two,  the  mantissa  of  the  floating  point  representation  will  remain  the 
same  for  all  other  weights.  The  other  weights  can  be  calculated  by  adding  or 
subtracting  to  the  eight-bit  exponent  of  the  floating  point  representation.  Since 
there  is  no  such  single  operation  in  the  C  programming  language,  to  utilize  this 
method  would  involve  masking  out  the  eight-bit  exponent,  modifying  it,  and  then 
placing  it  back  into  the  number.  The  benefits  of  this  multiple  instruction  operation 
over  simply  doing  the  multiplications  are  questionable.  However,  if  a 
reconfigurable  computer  is  used  to  perform  the  operation,  there  is  the  potential 
for  substantial  savings  in  computation  time. 

This  is  the  last  optimization  examined.  The  next  chapter  takes  a  closer 
look  at  the  effects  of  the  cut  and  slice  optimization,  detailed  in  Section  C  of  this 
chapter,  and  shows  the  overall  improvement  in  computation  speed  attained  by 
utilizing  the  optimizations. 
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IV.  ERROR  AND  TIMING  ANALYSIS 


A.  THE  (NOT  SO)  DELETERIOUS  EFFECTS  OF  CUT  AND  SLICE 

In  establishing  the  threshold  for  the  “cut  and  slice”  optimization,  the  author 
used  a  subjective  approach  to  determine  how  small  the  kernel  function  could  be 
cut  and  sliced  without  degrading  the  quality  of  the  results.  This  section  conducts 
an  objective  analysis  to  show  the  effect  of  the  cut  and  slice  operation.  Also,  this 
section  shows  the  effects  of  the  optimization  on  some  common  LPI  signals  to 
determine  the  effect  of  the  optimization  on  a  range  of  signals  at  various  signal  to 
noise  (SNR)  ratios. 

Five  common  LPI  signals  are  analyzed — each  one  at  three  different  levels 
of  SNR:  signal  only  (SNR  =  00  dB),  SNR  =  OdB  and  SNR  =  -6dB.  Each  of  the 
15  example  signals  were  measured  at  cut  =  slice  =  128,  64,  32,  16,  and  8  (see 
cut  and  slice  section  in  the  previous  chapter  for  explanation  of  levels)  and 
compared  to  the  results  of  the  unaltered  version  of  the  CWD  of  the  same  signal. 


Error  =  median 


CWD  -  CS 
CWD 


(4.1) 


To  compare  each  computation  result  to  the  original  version,  the  absolute 
value  of  the  difference  between  the  cut  and  slice  (CS)  version  and  the  original 
computation  are  measured  at  each  of  the  512  x  512  points  in  the  array,  and  then 
they  are  divided  by  the  average  value  of  all  the  points  in  the  original  array.  Then 
the  median  is  taken  (4.1). 
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CWD 

128 

64 

32 

16 

8 

Figure  33.  Figure  Map 


The  results  of  the  analysis  are  summarized  in  Figures  34  through  48 
below.  In  each  of  the  figures,  there  are  six  plots.  The  top  left  plot  shows  the 
result  of  the  unmodified  version  of  the  Choi-Williams  computation.  The  plot  to  its 
right  is  the  result  of  the  computation  with  cut  =  slice  =128,  which  is  the  number  of 
columns  and  rows  surrounding  the  peak  of  the  kernel  function,  (l)[ju  =  l,n  =  Q) . 

The  plot  below  the  original  is  for  cut  =  slice  =  64,  and  to  that  one’s  right  cut  = 
slice  =  32.  The  two  plots  on  the  bottom  are  for  cut  =  slice  =  16,  and  cut  =  slice  = 
8.  Figure  33  above  summarizes  the  layout.  At  the  bottom  of  each  set  of  time- 
frequency  plots  is  a  stem  plot  of  the  Error  measured  for  each  of  the  cut  and  slice 
computations  compared  to  the  original. 

Note  that  cut  =  slice  =  32  was  the  threshold  established  for  optimizing  the 
computation  using  the  subjective  approach  and  is  the  chosen  threshold  to  use  for 
the  optimized  version  of  the  algorithm. 

The  five  signals  examined  are;  Frequency  Modulated  Carrier  Wave 
(FMCW),  Polyphase  1  (PI),  Polytime  1  (PT1),  Costas  Frequency  Flopping 
(Costas),  and  Frequency  Shift  Keying/Phase  Shift  Keying  (FSK/PSK). 
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Figure  34.  Cut  and  Slice  results  for  FMCW,  signal  only,  including  a  stem  plot 
of  the  error  when  compared  to  the  original  (top  left). 
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Figure  35.  Cut  and  Slice  results  for  FMCW,  0  dB ,  including  a  stem  plot  of  the 

error  when  compared  to  the  original  (top  left). 
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Figure  36.  Cut  and  Slice  results  for  FMCW,  -6  dB,  including  a  stem  plot  of  the 

error  when  compared  to  the  original  (top  left). 
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Figure  37.  Cut  and  Slice  results  for  P1 ,  signal  only,  including  a  stem  plot  of  the 

error  when  compared  to  the  original  (top  left). 
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Figure  38.  Cut  and  Slice  results  for  P1 ,  0  dB ,  including  a  stem  plot  of  the  error 

when  compared  to  the  original  (top  left). 
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Figure  39.  Cut  and  Slice  results  for  P1 ,  -6  dB ,  including  a  stem  plot  of  the 
error  when  compared  to  the  original  (top  left). 
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Figure  40.  Cut  and  Slice  results  for  PT1,  signal  only,  including  a  stem  plot  of 
the  error  when  compared  to  the  original  (top  left). 
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Figure  41 .  Cut  and  Slice  results  for  PT1 ,  0  dB ,  including  a  stem  plot  of  the 
error  when  compared  to  the  original  (top  left). 
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Figure  42.  Cut  and  Slice  results  for  PT1,  -6  dB,  including  a  stem  plot  of  the 
error  when  compared  to  the  original  (top  left). 
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Figure  43.  Cut  and  Slice  results  for  Costas,  signal  only,  including  a  stem  plot 
of  the  error  when  compared  to  the  original  (top  left). 
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Figure  44.  Cut  and  Slice  results  for  Costas,  0  dB ,  including  a  stem  plot  of  the 

error  when  compared  to  the  original  (top  left). 
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Figure  45.  Cut  and  Slice  results  for  Costas,  -6  dB,  including  a  stem  plot  of  the 

error  when  compared  to  the  original  (top  left). 
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Figure  46.  Cut  and  Slice  results  for  FSK/PSK,  signal  only,  including  a  stem 
plot  of  the  error  when  compared  to  the  original  (top  left). 
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Figure  47.  Cut  and  Slice  results  for  FSK/PSK,  0  dB ,  including  a  stem  plot  of 
the  error  when  compared  to  the  original  (top  left). 
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Figure  48.  Cut  and  Slice  results  for  FSK/PSK,  -6  dB,  including  a  stem  plot  of 
the  error  when  compared  to  the  original  (top  left). 
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As  would  be  expected,  in  all  cases  in  the  above  error  plots,  the  more  the 
kernel  function  was  cut  and  sliced  down,  the  more  error  was  introduced  (using 
the  original  distribution  as  the  benchmark).  Also,  as  would  be  hoped,  the 
distribution  for  cut  =  slice  =  32  picture  of  the  signal  was  not  degraded 
significantly.  Interestingly,  in  many  cases,  the  picture  actually  looks  clearer.  For 
example,  note  in  Figure  49  that  for  the  FMCW,  signal  only,  the  optimized  version 
(for  the  rest  of  this  thesis,  optimized  refers  to  the  cut  =  slice  =  32  version,  and 
original  refers  to  the  un-optimized  version  of  the  distribution)  looks  much  clearer 
than  the  original.  If  this  is  a  signal  only  representation,  why  are  there  horizontal 
lines  across  the  picture? 


Figure  49.  Original  vs.  Optimized  FMCW,  signal  only. 


A  horizontal  line  across  the  plot  indicates  that  that  frequency  is  present 
throughout  the  entire  sample.  Figure  50  shows  the  kernel  function  for  times 
£  =  0  and  £  =  200 .  Note,  that  for  low  values  of  n  (the  columns  on  the  left  side), 
the  Gaussian  distribution  has  a  low  variance,  but  that  at  high  values  of  n ,  the 
Gaussian  distribution  quickly  approaches  being  completely  flat,  giving  near  equal 
representation  to  all  values  in  the  column. 
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Figure  50.  (I)[ju,l,n)  for  I  =  0  and  I  =  200 . 


Figure  51  shows  that  A(^,n^  (for  N  =  S),  it  can  also  be  seen  that  for  higher 

values  of  n ,  the  column  represents  data  points  multiplied  together  that  are  more 
distant  from  each  other  in  time. 
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Figure  51.  for  N  =  S . 
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Therefore,  for  higher  instances  of  n ,  the  summation  over  //  will  provide 
near  equal  representation  for  all  samples  multiplied  by  another  sample  that  is  2n 
in  distance  away — for  all  cases  of  i . 

Figure  52  shows  the  original  FMCW,  signal  only,  plot  with  the  distributions 
for  i  =  0  and  ^  =  200  highlighted  in  yellow.  Figure  53  shows  for  £  =  0 

and  £  =  200 .  Figure  54  shows  a  close  up  of  the  same  plot  for  200  <n<  250 . 
Figure  55  shows  CWD  for  I  =  0  and  £  =  200  . 


0  0  01  002  OOS  004  OOS  0  06  0  07 

Time  (b) 

Figure  52.  Flighlighted  original  FMCW,  signal  only. 


Figure  53.  S“{£,n)for  £  =  0  and  £  =  200  for  unaltered  CWD. 
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Figure  54.  Close  up  of  5'"(£,n)for  i  =  0  and  £  =  200  for  unaltered  CWD. 


Figure  55.  CWD^(i,co)  for  I  =  0  and  i  =  200  for  unaltered  CWD. 
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Note,  that  as  expected  from  the  previous  analysis,  S“(i,n)  produces 
nearly  identical  results  for  high  values  of  n  and  highly  disparate  values  of  £,  and 
that  these  results  are  derived  from  samples  multiplied  by  other  samples  that  are 
2n  in  distance  away  from  each  other.  This  is  certainly  not  desired  and  is  the 
cause  of  the  horizontal  lines  across  the  original  distribution.  Figures  56  through 
59  below  conduct  the  same  analysis  for  the  optimized  version  of  the  distribution. 

Figure  56  shows  the  optimized  kernel  function  for  times  i  =  0  and  £  =  200  . 
Figure  57  shows  the  optimized  CWD  for  the  FMCW,  signal  only,  plot  with  the 
distributions  for  £  =  0  and  £  =  200  highlighted  in  yellow.  Figure  58  shows 
S^{£,n)  for  £  =  0  and  1  =  200  .  Finally,  Figure  59  shows  the  optimized  CWD  for 
£  =  0  and  ^  =  200. 


Figure  56.  ^[ju,£,n)  for  I  =  0  and  £  =  200  ,  cut  =  slice  =  32. 


64 


3^(Iro  W} 


1500 


lOX) 


1300 


1200 


17  11X 

g  1000 

$ 

cr 

e  900 


800 


700 


eoo 


500 

0  0  01  002  0  03  0  04  0  05  0.06  0  07 

Time  (s) 


Figure  57.  Highlighted  optimized  FMCW,  signal  only. 


Figure  58.  5'"(£,n)for  l  =  Q  and  I  =  200 ,  cut  =  slice  =  32. 
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Figure  59.  CWD^{i,o})  for  £  =  0  and  i  =  200 ,  cut  =  slice  =  32. 

Figure  60  shows  three-dimensional  representations  of  the  CWD  for  both 
the  original  and  optimized  versions. 
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Figure  60.  Original  and  Optimized  FMCW,  signal  only. 
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It  can  be  concluded  from  this  analysis  that  not  only  is  the  optimized 
version  faster  and  better  looking,  it  portrays  a  more  accurate  depiction  of  the 
signal  as  well. 

B.  TIMING  RESULTS 

Coding  of  the  optimized  version  of  the  algorithm  was  conducted 
incrementally.  As  each  optimization  was  developed,  it  was  verified  using 
MATLAB  to  empirically  determine  that  the  math  was  correct,  and  then  the 
optimizations  were  applied  to  the  code  in  [5].  In  comparison  to  the  compiler  un¬ 
optimized  version  of  the  original  code,  using  the  symmetry  optimization  yielded 
an  approximately  tenfold  increase  in  speed.  The  cut  and  slice  optimization 
yielded  another  approximately  tenfold  increase  in  speed.  With  the  permission  of 
Professor  Breitenbach,  the  recursive  FFT  function  authored  by  Professor 
Breitenbach  used  in  the  original  code  was  unrolled,  placed  within  the  code  itself 
and  optimized  in  the  same  manner  as  described  in  the  FFT  optimization  section 
(Chapter  II),  increasing  the  speed  of  computation  by  approximately  another 
twofold.  Other  speed  increases  were  realized  by  reducing  redundant 
computations.  For  instance,  in  the  original  code,  the  kernel  function  is  essentially 
recalculated  N  times.  In  the  new  code,  the  windowed  (cut  and  sliced)  kernel 
function  is  pre-calculated  before  the  timing  starts.  The  original  code  was  also  de- 
parameterized  (the  original  was  built  to  perform  the  distribution  for  any  N  where 
is  a  power  of  two)  in  order  to  wring  out  every  possible  speed  increase. 

Finally,  it  made  sense  to  pipeline  the  code.  In  the  original  algorithm,  it  was 
necessary  to  use  all  data  samples  for  each  and  every  iteration  of  i ,  whereas  for 
the  optimized  version,  each  iteration  depends  only  on  samples  near  in  time  to  i . 
In  the  final  pipelined  version,  the  arrays  are  preloaded  with  the  samples  needed 
to  compute  the  first  computation  (^  =  -256),  a  new  sample  is  written  into  the 
array  over  the  oldest  sample  in  the  array,  and  the  computation  is  redone.  The 
priming  of  the  pipeline  before  timing  starts  is  cheating  a  little  bit.  However,  it 
should  only  account  for  1/ 512  th  of  the  speed  increase  over  the  original  code. 
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The  pipelined  version  was  developed  with  real,  pipelined  hardware  in  mind.  The 
final  code  should  be  easy  to  port  to  the  SRC-6  supercomputer  or  other  high 
performance  computing  hardware. 

To  compare  the  final  code  (pipe.c)  to  the  original  code  (choi.c  from 
[5])  five  test  runs  were  conducted  for  both  the  compiler  un-optimized  and 
compiler-optimized  versions  of  both  programs.  The  compiler  used  was  icc,  and 
the  optimizations  used  were: 

-03  -tpp7  -xW  -align  -Zp16  -ipo  -static 

The  trial  runs  were  conducted  on  an  Intel  chip,  Linux  based  PC.  Table  3  shows 
the  results. 


From 

[5] 

From  this  work 

choi .  c 

choi .  c 

(compiler 

optimized) 

pipe .  c 

pipe .  c 

(compiler 

optimized) 

Trail  1 

46.81 

6.860 

0.05442 

0.04699 

Trial  2 

46.51 

6.887 

0.05445 

0.04655 

Trial  3 

46.50 

6.852 

0.05457 

0.04640 

Trial  4 

46.23 

6.872 

0.05470 

0.04631 

Trial  5 

46.49 

6.824 

0.05426 

0.04661 

Average 

46.51 

6.859 

.05448 

.046572 

Table  3.  Time  in  seconds  of  trial  runs. 


The  optimized  version  of  the  code  produced  an  854X  increase  in  speed 
over  the  original  code.  The  optimized  version  of  the  code  compiled  with  an 
optimizing  compiler  produced  a  147X  increase  in  speed  over  the  compiler 
original  version  of  the  code  compiled  with  an  optimizing  compiler. 
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V.  CONCLUSION 


A.  BACKGROUND 

The  objective  of  this  thesis  was  to  improve  the  speed  at  which  the  CWD 
could  be  computed  on  the  SRC-6  reconfigurable  supercomputer.  Optimizing  the 
algorithm,  prior  to  porting  the  code  to  the  SRC-6,  yielded  enough  work  in  both 
quantity  and  level  of  difficulty  to  stand  alone  as  the  subject  of  this  thesis.  Also, 
the  optimizations  developed  in  this  thesis  are  applicable  to  any  implementation  of 
the  CWD. 

B.  RESULTS 

By  exploiting  the  symmetry  of  the  CWD  and  eliminating  the  computation  of 
near  zero  terms,  dramatic  gains  in  computation  speed  were  achieved.  Further 
gains  were  achieved  by  modifying  the  FFT  to  take  advantage  of  zero  terms.  The 
optimizations  altered  the  results  of  the  time-frequency  distribution;  however,  the 
altered  results  yield  a  more  accurate  time-frequency  representation  of  the  signal. 
The  optimized  algorithm  developed  in  this  thesis  is  a  significant  step  towards 
developing  a  system  that  can  identify  and  classify  LPI  signals  in  real  time.  This 
algorithm  is  not  platform  specific,  since  developed  using  pure  Mathematics.  It 
can  be  used  to  realize  faster  Choi-Williams  calculations,  with  a  better  result,  on 
any  platform. 

C.  RECOMMENDATIONS  FOR  FUTURE  WORK 

It  is  recommended  that  the  effort  to  port  this  algorithm  to  the  SRC-6  or 
some  other  FPGA  realization  be  continued.  Particularly,  the  “Optimization  Not 
Realized”  from  Chapter  III  should  yield  a  significant  increase  in  speed  if 
implemented  using  an  FPGA. 
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APPENDIX.  LPI  SIGNAL  GENERATION 


The  example  signals  used  for  this  thesis  were  taken  from  reference  [14], 
There  were  several  signals  generated  by  the  LPI  Toolbox  used  in  this  thesis. 
Each  signal  is  described  in  detail  below.  Each  signal  was  generated  with  a  signal 
to  noise  ratio  of  0  dB.  The  addition  of  noise  generates  more  realistic  results 
without  overwhelming  the  graph. 

1)  F_1_7_500_30_s.mat:  FMCW  signal  with  a  oo  dB  SNR 

2)  F_1_7_500_30_0.mat:  FMCW  signal  with  a  0  dB  SNR 

3)  F_1_7_500_30_-6.mat:  FMCW  signal  with  a  -6  dB  SNR 

4)  P1_1_7_8_1_s.mat:  P1  signal  with  a  oo  dB  SNR 

5)  P1_1_7_8_1_0.mat:  P1  signal  with  a  0  dB  SNR 

6)  P1_1_7_8_1_-6.mat:  P1  signal  with  a  -6  dB  SNR 

7)  PT1_1_7_2_4_s.mat:  PT1  signal  with  a  oo  dB  SNR 

8)  PT1_1_7_2_4_0.mat:  PT1  signal  with  a  0  dB  SNR 

9)  PT1_1_7_2_4_-6.mat:  PT1  signal  with  a  -6  dB  SNR 

10)  C_1_15_5000_s.mat:  Costas  signal  with  a  oo  dB  SNR 

11)  C_1_15_5000_0.mat:  Costas  signal  with  a  0  dB  SNR 

12)  C_1_15_5000_-6.mat:  Costas  signal  with  a  -6  dB  SNR 

13)  FSK_PSK_Costas_5_s.mat:  FSK/PSK  signal  with  a  oo  dB  SNR 

14)  FSK_PSK_Costas_5_0.mat:  FSK/PSK  signal  with  a  0  dB  SNR 

15)  FSK_PSK_Costas_5_-6.mat:  FSK/PSK  signal  with  a  -6  dB  SNR 

More  information  on  the  use  of  the  LPI  Toolbox  and  the  different  LPI  signals  is 
given  in  [4]. 
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To  use  these  signals  in  the  C  programming  environment,  they  were  converted  to 
text  files  using  the  following  code  [5],  [14], 


9-9-  9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9-  TMrTAT  9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9-  9-9- 

OO  OOOOOOOOOOOOOOOOOOOOO  r  oUdc:  ooooooooooooooooooo  oo 

load  H:\Thesis\Choi\Test_signals\F_l_7_250  20_0.mat 
sigl  =  [I  Q] ; 

save  S:\thesis\Test_signals_txt\F_l_7_250_20_0.txt  sigl  -ascii  -double 

9-9-  9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9-  T?-k'=it-iV  9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9-  9-9- 

oo  OOOOOOOOOOOOOOOOOOOOO  r  Xdiijv  ooooooooooooooooooo  oo 

load  H: \Thesis\Choi\Test_signals\FR_l_7_4_l_0 .mat 
sig2  =  [I  Q]  ; 

save  S:\thesis\Test_signals_txt\FR_l_7_4_l_0.txt  sig2  -ascii  -double 


9-9-  9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9-  Pr^c:-|-;=iQ  9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9- 9-  9-9- 

oo  ooooooooooooooooooo  '^UoLdo  '^CJU.c:  ooooooooooooooooooo  oo 

load  H ; \Thesis\Choi\Test_signals\C_l_15_5000_0 .mat 
sig3  =  [I  Q] ; 

save  S:\thesis\Test_signals_txt\C_l_15_5000_0.txt  sig3  -ascii  -double 


FSK/PSK  Costas  Code 
load  H:\Thesis\Choi\Test_signals\FSK  PSK  Costas_15_5_0 .mat 
sig4  =  [I  Q] ; 

save  S:\thesis\Test_signals_txt\FSK_PSK_Costas_15_5_0.txt  sig4  -ascii 
-double 
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